WHAT IS CLAIMED IS: 



1 1. A processing core comprising: 

2 one or more processing pipelines having a total of N-number of processing 

3 paths, each of said processing paths for processing instructions on M-bit data words; and 

4 a plurality of register files, each having Q-number of registers, said Q-number 

5 of registers being M-bits wide; 

6 wherein said Q-number of registers within each of said plurality of register 

7 files are either private or global registers, and wherein when a value is written to one of said 



8 Q-number of said registers which is a global register within one of said plurality of register 

9 files, said value is propagated to a corresponding global register in the other of said plurality 

10 of register files, and wherein when a value is written to one of said Q-number of said registers 

11 which is a private register within one of said plurality of register files, said value is not 

12 propagated to a corresponding register in the other of said plurality of register files. 
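
Illustrative sketch of the write behavior recited in claim 1. This is a minimal software model, not the claimed hardware. The class name RegisterFile, the helper make_register_files, and the use of a set of indices to mark global registers are assumptions made for illustration only.

```python
class RegisterFile:
    """Software model of one register file with Q registers of M bits each."""

    def __init__(self, q, m, global_indices):
        self.word_mask = (1 << m) - 1                    # keep values M bits wide
        self.global_indices = frozenset(global_indices)  # assumed bookkeeping
        self.regs = [0] * q
        self.peers = []                                  # the other register files

    def write(self, index, value):
        value &= self.word_mask
        self.regs[index] = value
        # A global write is copied to the corresponding register of every
        # other register file; a private write stays local.
        if index in self.global_indices:
            for peer in self.peers:
                peer.regs[index] = value


def make_register_files(count, q=64, m=64, global_indices=()):
    """Build `count` register files that all know about each other."""
    files = [RegisterFile(q, m, global_indices) for _ in range(count)]
    for rf in files:
        rf.peers = [other for other in files if other is not rf]
    return files


files = make_register_files(2, global_indices={0, 1})
files[0].write(0, 0xABCD)  # register 0 is global: also lands in files[1]
files[0].write(5, 0x1234)  # register 5 is private: files[1] is unchanged
```

In this model, register 0 of the second register file holds the written value after the first write, while register 5 of the second register file keeps its prior contents after the second write.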



1 2. The processing core as recited in claim 1, wherein every two of said 

2 N-number of processing paths share one of said plurality of register files. 

1 3. The processing core as recited in claim 1, wherein a processing 

2 instruction comprises N-number of P-bit instructions appended together to form a very long 

3 instruction word (VLIW), and said N-number of processing paths process N-number of P-bit 

4 instructions in parallel. 

1 4. The processing core as recited in claim 3, wherein M=64, Q=64, and 

2 P=32. 
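
Illustrative sketch of the instruction word recited in claims 3 and 4: N sub-instructions of P bits each are appended to form one VLIW. The function names pack_vliw and unpack_vliw, the choice of N=4 in the example, and the placement of slot 0 in the least significant bits are assumptions; the claims do not fix the slot ordering or a value for N.

```python
def pack_vliw(sub_instructions, p=32):
    """Append N P-bit sub-instructions into a single N*P-bit integer."""
    word = 0
    for slot, instr in enumerate(sub_instructions):
        if not 0 <= instr < (1 << p):
            raise ValueError(f"sub-instruction {slot} does not fit in {p} bits")
        word |= instr << (slot * p)  # assumed ordering: slot 0 is lowest
    return word


def unpack_vliw(word, n, p=32):
    """Split a VLIW back into its N P-bit sub-instructions, one per path."""
    mask = (1 << p) - 1
    return [(word >> (slot * p)) & mask for slot in range(n)]


# Example with N=4 (assumed) and P=32: a 128-bit instruction word.
subs = [0x11111111, 0x22222222, 0x33333333, 0x44444444]
vliw = pack_vliw(subs)
per_path = unpack_vliw(vliw, n=len(subs))
```

Each entry of per_path would be handed to one of the N processing paths, which is the parallel issue that claim 3 describes.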

1 5. The processing core as recited in claim 1, wherein said processing 

2 pipeline comprises an execute stage which includes an execute unit for each of said N- 

3 number of M-bit processing paths, each of said execute units comprising an integer 

4 processing unit, a load/store processing unit, a floating point processing unit, or any 

5 combination of one or more of said integer processing units, said load/store processing units, 

6 and said floating point processing units. 

1 6. The processing core as recited in claim 5, wherein an integer 

2 processing unit and a floating point processing unit share one of said plurality of register 

3 files. 

1 7. The processing core as recited in claim 1, wherein Q=64, and a 64-bit 

2 special register stores bits indicating whether a register in a register file is a private register or 

3 a global register, each bit in the 64-bit special register corresponding to one of said registers 

4 in said register file. 
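
Illustrative sketch of the 64-bit special register recited in claim 7, modeled as an integer bit mask with one bit per register. The helper names and the polarity (a set bit meaning "global") are assumptions; the claim states only that each bit corresponds to one register and indicates private or global.

```python
Q = 64
MASK64 = (1 << Q) - 1


def _check(index):
    if not 0 <= index < Q:
        raise IndexError(f"register index {index} out of range 0..{Q - 1}")


def is_global(special_reg, index):
    """True if bit `index` is set (assumed to mean the register is global)."""
    _check(index)
    return (special_reg >> index) & 1 == 1


def mark_global(special_reg, index):
    """Return a new mask with register `index` flagged as global."""
    _check(index)
    return (special_reg | (1 << index)) & MASK64


def mark_private(special_reg, index):
    """Return a new mask with register `index` flagged as private."""
    _check(index)
    return special_reg & ~(1 << index) & MASK64


special = 0                          # all registers start private
special = mark_global(special, 0)
special = mark_global(special, 63)
```

A single 64-bit value is enough because Q=64: one bit per register covers the whole register file.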

1 8. The processing core as recited in claim 1, wherein each of said 

2 plurality of register files is connected to a bus, and a value written to a global register in one 

3 of said plurality of register files is propagated to a corresponding global register in the other 

4 of said plurality of register files across said bus. 

1 9. The processing core as recited in claim 1, wherein said plurality of 

2 register files are connected together in serial, and a value written to a first global register in a 

3 first of said plurality of register files is propagated to a corresponding first global register in a 

4 second of said plurality of register files connected directly to said first of said plurality of 

5 register files. 
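
Illustrative sketch contrasting the two interconnects recited in claims 8 and 9. Register files are modeled as plain lists; the function names are hypothetical. For the serial case, claim 9 recites propagation only to a directly connected register file; forwarding the value onward hop by hop along the chain is an assumption of this sketch.

```python
def propagate_bus(files, source, index, value):
    """Bus-style: every other register file receives the write directly."""
    for i, regs in enumerate(files):
        if i != source:
            regs[index] = value


def propagate_serial(files, source, index, value):
    """Serial-style: pass the value neighbor to neighbor along the chain.

    Returns the (from, to) hops taken, in order.
    """
    hops = []
    for i in range(source + 1, len(files)):      # toward the end of the chain
        files[i][index] = value
        hops.append((i - 1, i))
    for i in range(source - 1, -1, -1):          # toward the start of the chain
        files[i][index] = value
        hops.append((i + 1, i))
    return hops


bus_files = [[0] * 4 for _ in range(3)]
bus_files[1][2] = 5                              # local write in file 1
propagate_bus(bus_files, source=1, index=2, value=5)

chain_files = [[0] * 4 for _ in range(3)]
chain_files[0][2] = 5                            # local write in file 0
chain_hops = propagate_serial(chain_files, source=0, index=2, value=5)
```

Both variants leave the corresponding global register equal in every register file; they differ only in the path the value takes.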

1 10. A VLIW processing core comprising: 

2 one or more processing pipelines each including a fetch stage, a decode stage, 

3 an execute stage, and a write-back stage, said execute stage having an execute unit 

4 comprising an integer processing unit, a load/store processing unit, a floating point 

5 processing unit, or any combination of one or more of said integer processing units, said 

6 load/store processing units, and said floating point processing units; and 

7 a register file for each of said one or more processing pipelines; 

8 wherein an integer processing unit and a floating point processing unit within 

9 said one or more processing pipelines both access said register file. 

1 11. In a computer system, a scalable computer processing architecture, 

2 comprising: 

3 one or more processor chips, each comprising: 

4 a processing core, including: 

5 a processing pipeline having N-number of processing paths, each of said 

6 processing paths for processing instructions on M-bit data words; and 

7 a plurality of register files, each having Q-number of registers, said Q-number 

8 of registers being M-bits wide; 

9 an I/O link configured to communicate with other of said one or more 

10 processor chips or with I/O devices; 

11 a communication controller in electrical communication with said processing 

12 core and said I/O link; 

13 said communication controller for controlling the exchange of data between a 

14 first one of said one or more processor chips and said other of said one or more processor 

15 chips; 

16 wherein said computer processing architecture can be scaled larger by 

17 connecting together two or more of said processor chips in parallel via said I/O links of said 

18 processor chips, so as to create multiple processing core pipelines which share data 

19 therebetween. 

1 12. The computer processing architecture as recited in claim 11, wherein 

2 in said processing core of each of said processor chips, every two of said N-number of 

3 processing paths share one of said plurality of register files. 

1 13. The computer processing architecture as recited in claim 11, wherein a 

2 processing instruction comprises N-number of P-bit instructions appended together to form a 

3 very long instruction word (VLIW), and said N-number of processing paths process N- 

4 number of P-bit instructions in parallel. 

1 14. The computer processing architecture as recited in claim 13, wherein 

2 M=64, Q=64, and P=32. 

1 15. The computer processing architecture as recited in claim 11, wherein 

2 said processing pipeline comprises an execute stage which includes an execute unit for each 

3 of said N-number of M-bit processing paths, each of said execute units comprising an integer 

4 processing unit, a load/store processing unit, a floating point processing unit, or any 

5 combination of one or more of said integer processing units, said load/store processing units, 

6 and said floating point processing units. 

1 16. The computer processing architecture as recited in claim 15, wherein 

2 an integer processing unit and a floating point processing unit share one of said plurality of 

3 register files. 

1 17. The computer processing architecture as recited in claim 11, wherein 

2 said Q-number of registers within each of said plurality of register files are either private or 

3 global registers, and wherein when a value is written to one of said Q-number of said 

4 registers which is a global register within one of said plurality of register files, said value is 

5 propagated to a corresponding global register in the other of said plurality of register files, 

6 and wherein when a value is written to one of said Q-number of said registers which is a 

7 private register within one of said plurality of register files, said value is not propagated to a 

8 corresponding register in the other of said plurality of register files. 

1 18. The computer processing architecture as recited in claim 17, wherein 

2 Q=64, and a 64-bit special register stores bits indicating whether a register in a register file is 

3 a private register or a global register, each bit in the 64-bit special register corresponding to 

4 one of said registers in said register file. 

1 19. The computer processing architecture as recited in claim 17, wherein 

2 each of said plurality of register files is connected to a bus, and a value written to a global 

3 register in one of said plurality of register files is propagated to a corresponding global 

4 register in the other of said plurality of register files across said bus. 

1 20. The computer processing architecture as recited in claim 19, wherein 

2 said plurality of register files are connected together in serial, and a value written to a first 

3 global register in a first of said plurality of register files is propagated to a corresponding first 

4 global register in a second of said plurality of register files connected directly to said first of 

5 said plurality of register files. 